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I.  . Introduction 

In  many  algorithms,  a  Euclidean  inner  product  of  two  vectors  must 
be  ccmputed  with  greater  precision  than  the  rest  of  the  calculations. 

An  exan5)le  is  the  calculation  of  the  residual  vector 

r  =  b  -  Ax  (1) 

used  in  an  algorithm  for  improving  an  approximate  solution  x  of 
the  lineP'*'  system 

Ax  =  b  . 

When  the  inner  product  occurs  in  an  algorithm  coded  in  short 
precision,  it  is  usually  sufficient  to  accumulate  it  in  long  precision 
(double  precision) .  Long-precision  arithmetic  is  a  hardware  feature  of 
many  machines;  if  so,  the  inner  product  is  easily  coded  and  quickly 
executed.  However,  when  long^precision  arithmetic  is  not  available,  or  j 

when  the  entire  algorithm  is  coded  in  long  precision,  the  inner  product  ‘ 

routine  beccanes  more  difficult  to  code  and  execution  time  may  become 

I 

excessive. 

This  report  is  primarily  concerned  with  existing  routines  for 
evaluating  inner  products  using  more  precision  than  long,  for  use  within 
long-precision  programs  for  the  IBM  Sy8tem/560.  Several  such  subroutines 
can  be  called  from  Fortran  H  programs;  one  is  available  for  Watfor  (or 
Watfiv)  Fortran  programs  and  one  for  Algol  W. 

( 

] 
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II.  Algol  W 

The  double  precision  inner  product  routine  available  for  Algol  W 
programs  is  the 

long  real  procedure  ip2  (integer  i;  integer  value  t,  s,  u; 

long  real  a,  b;  long  real  value  c) ; 
comment  This  procedure  computes  the  sum  of  products  ayb  and 

adds  it  to  the  extra  term  c.  The  bound  vaxdable  i  is  used 
to  indicate  the  subscript  in  the  cwnponents  of  the  vectors 
a  and  b  over  which  the  scalarproduct  is  foimed.  Although  the 
procedure  body  is  more  complicated,  it  can  be  illustrated  as 
follows : 

begin  long  real  sum,  sum  :=  O.OL, 

for  i  :=  i  step  s  \intil  u  ^  sum  :=  sum  +  a*b, 
sum  +  c 


Jense.i*s  device  is  used  through  the  bound  variable  i  .  For  example, 
ip2  could  be  used  to  compute  the  vector  r  in  Equation  (1)  as  follows: 

for  i  :=  1  step  1  \intil  n  ^ 

r(i)  :=  -ip2(k,l,l,n,A(i,k),x(k),-b(i)) 

Since  each  product  has  28  significant  hex  digits  and  a  double  word  has 
only  Ih  digits,  a  technique  related  to  that  suggested  by  Miller  [1965] 
is  used  to  retain  full  significance.  For  illustrative  purposes,  consider 
the  following  segment  of  an  Algol  W  program: 

real  t;  long  real  a,  al,  a2,  b,  bl,  b2,  b5; 

• 

comment  a  and  b  have  been  assigned  double  precision  values; 
t  :=  a;  al  :=  t;  a2  :=  a-al; 
t  :=b;  bl  :=t;  b3  :=b  -  bl; 
t  :=  b5;  b2  :=  t;  b5  :=  b5  -b2; 
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The  above  program  segment  splits  the  numbers  a  and  b  so  that 

a  =  al+  a2 
b  =  bl  +  b2  +  b3  . 


Thus 

axb  =  (al+a2)  x  (bl+b2  +  b3) 

=  al*bl  +  al*  (b2  +  b3)  +  a2*bl  +  a2*b2  +  a2*b3  ( 2) 

where  *  indicates  double-precision  floating-point  multiplication  and 
the  symbols  x  >  +  and  =  have  the  usual  mathematical  interpretation. 

The  terms  of  Equation  (2)  are  accumulated  using  a  technique 
suggested  by  Malcolm  [I97O].  It  follows  directly  from  Theorem  2  in 

A 

Malcolm  [1970]' that  provided  n  <  15107  ,  the  result  (|)  calculated  by 
lp2  satisfies 

l=l(l+e)  (5) 

where 

je]  <  1+*16’^ 

and  I  is  the  exact  result.  The  procedure  can  be  easily  modified  to 
accommodate  n  >  13107  and  still  satisfy  Equation  (5)  • 

The  parameters  i  ,  a  and  b  are  passed  by  name  to  give  maximum 
generality.  One  may  wish  to  modify-  this  to  economize  on  execution  time. 
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III.  Wat  for  (or  Watfiv)  Fortran 


The  same  techniques  used  in  ip2  are  implemented  in  two  Fortran 
subroutines :  DPRrT(A,B)  and  ITTOTL(S)  .  The  ceQl: 

CALL  DPPUT(A,B) 

adds  the  product  A  x  B  (A  and  B  are  double  precision)  to  the 
accumulators.  The  call: 

CALL  IPTOTL(S) 

sums  the  accumuilators  and  assigns  the  long  precision  result  to  S  .  The 
subroutine  IPTOTL  leaves  the  accxmiulators  in  their  initial  state  (all 
zero) . 

The  result  S  (=  ?)  satisfies  (3)  provided  DPHJT  has  not  been 
called  more  than  13^107  times  since  the  accumulators  were  last  initialized 
DPHJT  and  IPTOTL  use  a  named  common  area  called  DPACCC  for  storing 
the  accumulators.  A  BLOCK  DATA  subprogram  is  used  for  initializing  the 
named  common  data  area. 

Following  is  an  example  using  DPHJT  and  IPTOTL  to  calculate  the 
r  vector  in  Equation  (1) . 

D0  10  I  =  1,W 
D0  5  J  =  I7N 

5  CALL  DPHJT(-A(I,J),X(J)) 

CALL  DPHJT(B(I),1-0D0) 

10  CALL  IPT0TL(R(l)) 

IV.  Fortran  H 

Several  efficient  subroutines  can  be  called  by  a  Fortran  H  program 
for  computing  double-plus  inner  products. 
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YPR2  is  a  subroutine  written  by  Ehrman  [I967]  that  forms  the 
double-long  product  of  two  double  precision  arguments  and  adds  it  to  a 
double-long  sum.  For  example,  VPR2  could  be  used  for  computing  the  r 
vector  of  Equation  (l)  as  follows: 

REAL*8  R1(2),A(N),B(N),X(N),R(N) 

INTEGER  IE5CP 


• 

D0  10  I  =  1,N 
lEXP  =  0 
Rl(l)  =  O.ODO 
Rl(2)  =  O.ODO 
D0  5  J  =  liN 

5  CALL  VPR2(-A(I,J),X(J),R1(1),IE3CP) 
CALL  VPR2(1.0D0,B(l),Rl(l),IEn') 

IF  (lEXP  NE.O)  T0  100 
10  R(I)  =  Rl(l) 


100  (write  error  message  and/or  teiminate} 


In  the  above  example,  R1  is  an  accumulator  with  50  hex  digits  (two  double 
words  with  the  exponent)  and  lEXP  is  used  as  an  Indication  of  underflow  or 
overflow. 

Although  VPR2  uses  a  50  hex  digit  accumulator,  it  can  still  result 
in  a  large  relatl/e  error.  Examples  can  be  constructed  that  result  in  no 
significant  digits.  However,  practical  algorithms  in  which  this  phenomenon 
causes  an  unacceptable  loss  of  precision  are  probably  rare. 

All  calculations  in  VPR2  are  performed  in  the  "general  registers". 
Althou^  TR2  requires  a  subroutine  linkage  for  each  term  of  the  inner 
product,  execution  times  compare  favorably  with  the  fastest  routines. 


B.  DPRJT  and  IPTOTL 

The  routines  described  in  Part  III  for  use  in  Watfor  are  available 
in  more  efficient  versions  coded  in  PL560  for  use  with  Fortran  H.  The 
PL560  versions  of  DPPUT  and  IPTOTL  differ  from  the  Fortran  versions  in 
that  full  precision  accuracy  is  obtained  and  the  result  is  correctly 
rounded.  This  is  achieved  by  a  technique  described  in  Section  V  of 
Malcolm  [1970].  Also,  the  result  has  full  precision  accuracy  and  is 
correctly  rounded. 

C .  DPDOTP 

DPDOTP  is  a  PL36o  function  subroutine  which  uses  the  same  techniques 
as  DPRJT  and  IPT^L  described  above.  The  function  call  for  DPDOTP  has  a 
variable  length  parameter  list.  The  full  formal  parameter  list  is: 

DH)0TP(  A,  B,  N,  XTERM,  INCA,  INCB,  PVA,  PVB) 

where 

A,B  —  The  locations  of  the  first  con^onents  of  the  long -precision 
vectors  to  be  multiplied 

N  —  Tlie  number  of  terms  entering  the  inner  product 

XTERM  —  An  extra  double  precision  term  to  be  added  to  the  inner 
product  (optional) 

INCA  —  Number  of  (double)  words  separating  successive  elements  of 
the  vector'  A  (optional) 

INCB  —  Nvnnber  of  (double)  words  separating  successive  elements  of 
the  vector  B  (optional) 
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PVA  —  Integer  vector  specifying  a  pennutation  of  the  elanents 


of  the  vector  A  (optional) 

PVB  —  Integer  vector  specifying  a  permutation  of  the  elements 
of  the  vector  B  (optional) 

In  the  actual  parameter  list,  only  the  first  three  parameters  (A  ,  B 
and  N)  are  required.  Default  values  of  the  remaining  peurameters  are: 


XTERM 

=  O.ODO 

INCA 

=  1 

INCB 

=  1 

PVA(I) 

=  I 

(I 

II 

ro 

• 

• 

• 

PVB(I) 

=  I 

(I 

-  1,2,...) 

For  illustrative  purposes  assume  the  foUcwlng  declarations 

HEAL*8  DH)OTP,A(N,N),B(N),C(N),SUM,R(N),X(N) 

INTEGER*^  PA(N) 

Note  that  DIDOTP  must  be  declared  as  a  long-precision  floating-point 
variable.  A  statement  vhich  sets  SUM  to  the  inner  product  of  the  vectors 
B  and  C  is 

SUM  =  DEDOTP(B,C,N) 

Another  example  is  the  calculation  of  the  residual  vector  in  Equation  (l) 
D^  10  I  =  1,N 

10  R(I)  =  -DPDjdTP(A,X,N,-B(l),N) 

In  this  example,  INCA  must  be  N  because  Fortrw  stores  the  array  A 
in  column  order  (see  the  Fortran  IV(H)  Programmer's  Guide)  which  means 
neighboring  elements  In  a  given  row  of  A  are  separated  by  N  double 
words.  If  the  columns  of  A  ,  in  the  above  example,  were  permuted  as 


specified  by  the  integer  vector  PA  ,  the  calcxilation  of  the  residual 
vector  would  then  be  as  follows : 

D0  10  I  =  1,N 

10  R(I)  =  "DHDjjTP(A,X,N,-B(l),N,l,PA) 

A  PL560  single  precision  function  subroutine  for  calculating  the  exact 
rounded  inner  product  of  single  precision  vectors  is  also  available.  This 
routine,  called  SPDOTP,  has  the  same  calling  sequence  as  DH)0TP. 


D.  DOTP 


DOTP  is  an  Assembler  Language  function  subroutine  written  at 
Argonne  National  Laboratories  (see  Jordan  [I967]).  'Ihe  formal  i>arameter 
list  is 

1)0TP(A,  B,N) 

where 

A,  B  —  The  locations  of  the  first  conponents  of  the  vectors  to 
be  multiplied 

N  —  The  number  of  terns  entering  the  inner  product 

For  exasgple,  the  residual  vector  in  Equation  (1)  could  be  calculated  as 
follows : 

REAL*8  D^P, A(N,N)  ,X(N)  ,B(N)  ,R(N)  ,TEMP(N) 

• 

10  I  =  1,N  * 

5  J  =  1>N 

5  TQ{P(J)  =  A(I,J) 

10  R(I)  =  B(I)  -D0TP(TiJlP,X,N) 

Note  that  DOTP  must  be  declared  as  a  long-precision  variable. 
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DOTP  uses  the  same  techniques  as  DH)OTF  (l.e.,  splitting  the 
ox>erands  and  32  accumulators) ;  hovever^  DOTP  does  a  number  of  internal 
subroutine  linkages  (proportional  to  N)  to  code  that  is  in  line  in 
DEDOTP. 


rlson  of  Execution  !f  lmes 


Each  of  the  routines  descrlT^ed  abrve  has  tmdergone  extensive 
tests  to  Insure  accuracy.  In  addition  to  these  tests,  each  routine 
was  timed  on  the  360/67  with  the  following  two  calculations: 

N 

Test  No.  1:  E  x  b. 

k=l  ^ 


Test  No.  2: 


Each  factor  a^j^  ,  aj^  ,  b^^  entering  the  inner  product  for  these  teste 
was  eq\ial  to  3.1^15926535097952  . 

The  e:q>erimental  results  are  tabulated  in  Table  I  in  terms  of  values 
of  K  for  determining  execution  time  according  to 

execution  time  =  K  x  N 
in  milliseconds. 

The  i>epple  who  programmed  the  various  routines  are  acknowledged 


in  Table  I. 


TABLE  I 


Values  of  K  for 


execution  time  =  Kxno.  of  terms  in  inner  product  (ms) 


tJ 


Calling 

Language 

Inner 

Product 

Routine 

Inner 

Product 

Compiler 

Programmer 

K 

for 

K 

for 

k 

Algol  W 

•ip2 

Algol  W 
(w/o  $N0CHECK) 

Michael 

Saunders 

0.710 

0.705 

Algol  W 

ip2 

Algol  W 

(with  $N0CHECK) 

Michael 

Saunders 

0.5^4 

0.526 

Fortran 

DPHJT 

IPTOTL 

Watfiv 
(w/o  NCTHBCK) 

Gordon 

GuUahom 

2.12 

2.05 

Fortran 

DPHJT 

IPTOTL 

Watfiv 

(with  NOCHECK) 

Gordon 

GuUahom 

2.U 

2.06 

Fortran 

DPHJT 

IPTOTL 

Fortran  H 
opt  =  0 

Gordon 

GuUahom 

0.424 

0.421 

Fortran 

DPHJT 

IPTOTL 

Fortran  H 
opt  =  2 

Gordon 

GuUahom 

0.552 

0.552 

Fortreui 

DPHJT 

IPTOTL 

PL560 

Michael 

Malcolm 

0.212 

0.210 

Fortran 

DPDOTP 

PL56o 

Michael 

Malcolm 

0.184 

0.184 

Fortran 

VPR2 

OS/Assembler 

John  Ehrman 

0.196 

0.196 

Fortran 

DOTP 

OS/Assembler 

D.  Jordan 

0.242 

0.218 

tJ 


All  tests  were  performed  on  an  IBM  560/67. 
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VI.  C one luB Iona 

Many  long-preclslon  routines  requiring  accurate  inner  products 
can  be  coded  in  either  Fortran  or  Algol  W.  For  Fortran,  DPHJT  and  IFFOTL 
are  probably  the  most  useful  for  three  reasons:  (l)  they  are  easy  to  use 
and  fast;  (2)  accuracy  of  the  result  is  guaranteed;  and  (?)  programs 
using  then  can  be  debiigged  and  run  with  the  Wat  for  (or  Watflv)  compiler. 
For  programs  vhlch  are  to  be  debugged  and  run  with  the  Fortran  H  compiler, 
DFDOTF  is  probably  the  best  because  it  is  easy  to  use,  execution  time  is 
minimal  and  the  result  is  guaranteed. 
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